Online-Academy
Look, Read, Understand, Apply

Data Mining And Data Warehousing

Reconstruction-based Outlier Detection

Reconstruction-based outlier detection methods identify outliers by measuring how well a data point can be reconstructed from a compressed or transformed representation of the original data. The core idea is that normal points in the original data set can be reconstructed with low error, whereas outliers have high reconstruction error. Principal Component Analysis (PCA) is one such reconstruction-based outlier detection method.

For example, imagine we have a dataset with two highly correlated features, f1 and f2. If the data points are plotted in 2D space, most of them lie along a diagonal line.

The data points (2,2), (3,4), (4,6), (5,8), (6,10) are normal data. Consider another data point, (11,3), which is an outlier. The normal points follow the linear relationship f2 = 2 * f1 - 2, but the outlier (11,3) does not fit this pattern: for f1 = 11 the relationship gives f2 = 20, not 3.
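
The pattern in this example can be checked with a few lines of Python. This is only a minimal sketch: the function name deviation_from_line is made up for illustration, and the line f2 = 2 * f1 - 2 is taken from the example data rather than learned from it.

```python
# Data points from the example
normal_points = [(2, 2), (3, 4), (4, 6), (5, 8), (6, 10)]
outlier = (11, 3)

def deviation_from_line(point):
    """Absolute gap between f2 and the value given by f2 = 2 * f1 - 2."""
    f1, f2 = point
    return abs(f2 - (2 * f1 - 2))

normal_deviations = [deviation_from_line(p) for p in normal_points]
outlier_deviation = deviation_from_line(outlier)

print(normal_deviations)   # [0, 0, 0, 0, 0]
print(outlier_deviation)   # 17
```

Every normal point sits on the line, while the outlier is 17 units away from the f2 value the line predicts.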

Step-by-Step

  • First, take the original data points.
  • Then construct a compressed or transformed representation of the original data points.
  • Then reconstruct the original points from the compressed or transformed representation.
  • Calculate the error for each original data point by comparing it with its reconstructed counterpart.
  • Normal data points will have low errors, while outliers will have high errors.
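
The steps can be sketched with PCA using NumPy on the example data. This is a minimal illustration, not a definitive implementation, and it rests on two assumptions that are not stated in the text: the principal direction is learned from the five normal points only, and the error is measured as the Euclidean distance between a point and its reconstruction. The variable names are chosen for illustration.

```python
import numpy as np

# Step 1: the original data points (five normal points and one outlier)
normal = np.array([[2, 2], [3, 4], [4, 6], [5, 8], [6, 10]], dtype=float)
outlier = np.array([[11, 3]], dtype=float)
points = np.vstack([normal, outlier])

# Assumption: learn the mean and the first principal direction
# from the normal points only, so the outlier does not distort them.
mean = normal.mean(axis=0)
_, _, vt = np.linalg.svd(normal - mean, full_matrices=False)
components = vt[:1]                     # shape (1, 2): one principal direction

# Step 2: compress every point from 2 features down to 1 coordinate
compressed = (points - mean) @ components.T

# Step 3: reconstruct the 2D points from the 1D representation
reconstructed = compressed @ components + mean

# Step 4: reconstruction error = distance between original and reconstruction
errors = np.linalg.norm(points - reconstructed, axis=1)

# Step 5: the point with the largest error is the outlier candidate
suspect = points[errors.argmax()]
print(np.round(errors, 3))
print(suspect)
```

With these assumptions, the five normal points are reconstructed with essentially zero error, while (11,3) has an error of about 7.6, which is its distance from the line f2 = 2 * f1 - 2. In practice the representation is often fitted on all of the data, and with so few points a single extreme outlier can then pull the principal direction toward itself and make the separation less clear.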